English/Arabic Cross Language Information Retrieval (CLIR) for Arabic OCR-Degraded Text

نویسنده

  • Tarek A. Elghazaly
چکیده

In this paper, a novel for Query Translation and Expansion for enabling English/Arabic CLIR for both normal and OCR-Degraded Arabic Text model has been proposed, implemented, and tested. First, an English/Arabic Word Collocations Dictionary has been established plus reproducing three English/Arabic Single Words Dictionaries. Second, a modern Arabic Corpus has been built. Third, a model for simulating the Arabic OCR errors has been proposed. Forth, a comprehensive model for Query Translation and expansion is proposed. The model translates the Query from English to Arabic detecting and translating collocations, translating single words and transliterating names. It solves the replacement ambiguity then it expands the Arabic Query to handle the expected Arabic OCR errors. The proposed model gives high accuracy in translating the Queries from English to Arabic solving the translation and transliteration ambiguities and with orthographic query expansion, it gave high degree of accuracy in handling OCR errors.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Arabic/English Cross Language Information Retrieval Using a Bilingual Dictionary

With the increase of multilingual information available online and the increase of non-native English speaker (Arabic users) browsing the Internet, it has become more important to have information retrieval systems that can carry the retrieval process across language boundaries that is, cross language information retrieval CLIR systems. The CLIR system responds to the user query in a comprehens...

متن کامل

Cross Language Information Retrieval Model For Discovering WSDL Documents Using Arabic Language Query

Web service discovery is the process of finding a suitable Web service for a given user’s query through analyzing the web service‘s WSDL content and finding the best match for the user’s query. The service query should be written in the same language of the WSDL, for example English. Cross Language Information Retrieval techniques does not exist in the web service discovery process. The absence...

متن کامل

1 On Bidirectional English - Arabic Search

In Cross-Language Information Retrieval (CLIR), queries in one language retrieve relevant documents in other languages. Machine-Readable Dictionaries (MRD) and Machine Translation (MT) systems are important resources for query translation in CLIR. We investigate the use of MT systems and MRD to Arabic-English and English-Arabic CLIR. The translation ambiguity associated with these resources is ...

متن کامل

Building an Arabic Stemmer for Information Retrieval

In TREC 2002 the Berkeley group participated only in the English-Arabic cross-language retrieval (CLIR) track. One Arabic monolingual run and three English-Arabic cross-language runs were submitted. Our approach to the crosslanguage retrieval was to translate the English topics into Arabic using online English-Arabic machine translation systems. The four official runs are named as BKYMON, BKYCL...

متن کامل

Design, Construction and Validation of an Arabic-English Conceptual Interlingua for Cross-lingual Information Retrieval

This paper describes the issues involved in extending a trans-lingual lexicon, the TextWise Conceptual Interlingua (CI), with Arabic terms. The Conceptual Interlingua is based on the Princeton English WordNet (Fellbaum, 1998). It is a central component in the cross-lingual information retrieval (CLIR) system CINDOR (Conceptual INterlingua for DOcument Retrieval). Arabic has a rich morphological...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009